11. SHAP importance

  • motivating case study

  • game theoretic motivation

  • translation to ML

  • visualization

Learning outcomes

  1. Describe the theoretical foundation of post-hoc explanation methods like SHAP values and apply them to realistic case studies with appropriate validation checks.
  2. Within a specific application context, evaluate the trade-offs associated with competing interpretable machine learning techniques.

Motivating Case Study

Enhancers are stretches of the genome that coordinate the expression of many downstream genes.

  • Their status is determined by combinations of bound transcription factors and chromatin marks

Understanding these interactions helps explain the genotype \(\to\) phenotype map.

Motivating Case Study

This can be studied using developing fruit flies.


Question

Which features drive enhancer activity in each sequence?

Data

Genome-wide measurements from blastoderm (stage 5) Drosophila embryos:

  • DNA occupancy for 23 transcription factors
  • Activity for 13 chromatin markers
  • Binary labels indicating enhancer activity for genomic regions

Each observation: a genomic sequence with associated regulatory features.

Statistical Formulation

Enhancer status: \(y \in \{0,1\}\)

Predictors: \(x = (x_1, \ldots, x_D)\) (TF binding intensities, chromatin signals)

Model: \(f(x_i) = \mathbb{P}(y_i = 1 \mid x_i)\)

Goal. Quantify each feature’s contribution to \(f(x_i)\).
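As a concrete stand-in for this formulation, one might fit a logistic model to synthetic data with the same shape as the Drosophila measurements. Everything below (dimensions, coefficients, the choice of logistic regression) is illustrative, not the actual analysis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for the enhancer data: 36 regulatory features
# (23 TF occupancy signals + 13 chromatin markers), binary enhancer labels.
n, D = 1000, 36
X = rng.normal(size=(n, D))
beta = rng.normal(size=D)                      # invented "true" effects
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta))).astype(int)

# f(x) = P(y = 1 | x), here estimated with a simple logistic model.
model = LogisticRegression(max_iter=1000).fit(X, y)
f = lambda X: model.predict_proba(X)[:, 1]

print(f(X[:1]))  # predicted enhancer probability for the first region
```

Any probabilistic classifier would do here; the attribution machinery that follows treats \(f\) as a black box.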

Game Theory

Credit Assignment

How to distribute profit across employees \(i\) in a company if any team \(S\) has profit \(v(S)\)?


Game theoretic analogy.

Shapley Credit Assignment

Employee \(i\)’s credit: average marginal contribution \(v(S) - v(S \setminus \{i\})\) over all teams \(S \ni i\).

\[\begin{align} \varphi(i) = \frac{1}{D} \sum_{d = 1}^{D} \frac{1}{\binom{D-1}{d-1}}\sum_{S \in S_{d}(i)} [v(S) - v(S \setminus \{i\})] \end{align}\]

where \(S_{d}(i)\) collects subsets of size \(d\) containing \(i\).
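This formula can be evaluated by brute-force enumeration when \(D\) is small. A minimal sketch on an invented three-employee game, where two employees are perfect substitutes and a third contributes nothing:

```python
from itertools import combinations
from math import comb

def shapley(players, v):
    """Exact Shapley values: average marginal contribution over all teams."""
    D = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for d in range(1, D + 1):                 # team sizes containing i
            weight = 1.0 / comb(D - 1, d - 1)     # 1 / |S_d(i)|
            for rest in combinations(others, d - 1):
                S = frozenset(rest) | {i}
                total += weight * (v(S) - v(S - {i}))
        phi[i] = total / D
    return phi

# Toy game: profit is 10 if 'a' or 'b' is on the team; 'c' is a dummy.
def v(S):
    return 10.0 if ('a' in S or 'b' in S) else 0.0

phi = shapley(['a', 'b', 'c'], v)
print(phi)  # 'a' and 'b' split the credit equally; 'c' gets zero
```

Note the axioms in action: symmetry gives `a` and `b` equal credit, the dummy axiom gives `c` zero, and the credits sum to \(v(\{a,b,c\}) - v(\emptyset) = 10\).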

Axioms

Shapley values are the unique credit assignment satisfying:

Symmetry. Equal marginal contributions \(\to\) equal credit. \[\begin{align*} \varphi(i) = \varphi(j) \;\text{ if }\; v(S \cup \{i\}) - v(S) = v(S \cup \{j\}) - v(S) \;\; \forall S \end{align*}\]

Dummy. Zero marginal contribution \(\to\) zero credit. \[\begin{align*} \varphi(i) = 0 \;\text{ if }\; v(S \cup \{i\}) = v(S) \;\; \forall S \end{align*}\]

Additivity. For two games \(v_1, v_2\), attributions add. \[\begin{align*} \varphi_i(v_1 + v_2) = \varphi_i(v_1) + \varphi_i(v_2) \end{align*}\]

Efficiency

Efficiency. All credit is distributed; nothing is double counted.

\[\sum_{i=1}^D \varphi(i) = v(\{1, \ldots, D\}) - v(\emptyset)\]

Feature Attribution

Game Theory \(\to\) Machine Learning

For each prediction \(f(x)\), define a game:

\[\begin{align} v_{x}(S) = \mathbb{E}_{p(x'_{S^C} \mid x_{S})}[f(x_{S}, x'_{S^C})] \end{align}\]

where \(x_S\) denotes coordinates in \(S\) fixed at their observed values, and \(x'_{S^C}\) are the remaining coordinates drawn from their conditional distribution.

How to define \(v(S)\)? This determines what “importance” means.

Shapley Feature Attribution

Feature \(i\)’s contribution to \(f(x)\):

\[\begin{align} \varphi_{x}(f, i) = \frac{1}{D} \sum_{d = 1}^{D} \frac{1}{\binom{D-1}{d-1}}\sum_{S \in S_{d}(i)}[v_{x}(S) - v_{x}(S \setminus \{i\})] \end{align}\]

These satisfy: \[\sum_{i=1}^D \varphi_{x}(f, i) = f(x) - \mathbb{E}[f(X)]\]

Each \(\varphi_x(f,i)\) explains deviation from baseline prediction.
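A sketch of this computation for a linear model, with names and dimensions invented for illustration. Because the background features are drawn independently here, the conditional expectation in \(v_x(S)\) reduces to a plain average over background samples, so \(v_x(S)\) can be estimated by fixing the features in \(S\) and averaging \(f\) over the rest. For a linear \(f\) this recovers the closed form \(\varphi_x(f, i) = w_i (x_i - \mathbb{E}[X_i])\):

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(0)

# Hypothetical linear model over D independent features.
D = 4
w = rng.normal(size=D)
f = lambda X: X @ w

X_bg = rng.normal(size=(500, D))   # background sample for the expectation
x = X_bg[0]                        # the prediction to explain

def v(S):
    """Value of coalition S: hold features in S at x, average f over the rest."""
    idx = list(S)
    Xc = X_bg.copy()
    Xc[:, idx] = x[idx]
    return f(Xc).mean()

phi = np.zeros(D)
for i in range(D):
    others = [j for j in range(D) if j != i]
    for d in range(1, D + 1):
        for rest in combinations(others, d - 1):
            S = set(rest) | {i}
            phi[i] += (v(S) - v(S - {i})) / comb(D - 1, d - 1)
phi /= D

# Efficiency: attributions explain the deviation from the baseline prediction.
print(phi.sum(), float(x @ w) - f(X_bg).mean())
```

The \(2^D\) enumeration is only feasible for small \(D\); practical SHAP implementations approximate this sum.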

Geometric Interpretation

\[\begin{align} v_{x}(S) = \mathbb{E}_{p(x'_{S^C} \mid x_{S})}[f(x_{S}, x'_{S^C})] \end{align}\]


Conditional expectation when features \(S\) are fixed at \(x_S\).

Exercise: SHAP Visual Explanation

Respond to [SHAP Visual Explanation] in the exercise sheet.

Main Difficulty

The conditional expectation \(\mathbb{E}[f(x_S, X_{S^C}) \mid x_S]\) is generally not computable.

This forces a choice:

  1. Marginal Shapley. Replace with \(\mathbb{E}[f(x_S, X_{S^C})]\)
    • Ignores feature correlations
    • Reveals model’s functional form
  2. Conditional Shapley. Model \(p(x_{S^C} \mid x_S)\)
    • Respects correlations
    • Hard to estimate, often biased

We discuss this further next week.
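A toy Gaussian example (all values invented) makes the gap concrete: with two highly correlated features and a model that reads only the first, the two choices of \(v(S)\) disagree sharply about the value of fixing only the second feature:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two strongly correlated features; the model ignores feature 1 entirely.
rho = 0.95
cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=20_000)
f = lambda X: X[:, 0]
x = np.array([2.0, 2.0])        # the observation to explain

n = len(X)
# Marginal v({1}): impute feature 0 from its marginal distribution.
v_marg = f(np.column_stack([X[:, 0], np.full(n, x[1])])).mean()
# Conditional v({1}): for a bivariate Gaussian, E[X0 | X1 = x1] = rho * x1.
v_cond = f(np.column_stack([np.full(n, rho * x[1]), np.full(n, x[1])])).mean()

print(v_marg)  # ≈ 0: fixing feature 1 carries no information on its own
print(v_cond)  # = 1.9: the correlation transfers credit to feature 1
```

Marginal Shapley reports what the model does with feature 1 (nothing); conditional Shapley reports what knowing feature 1 implies about the prediction.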

Visualization - One Sample

Attributions sum to \(f(x) - \mathbb{E}[f(X)]\): visualize as a stacked bar connecting the baseline \(\mathbb{E}[f(X)]\) to the prediction \(f(x)\).

Visualization - Many Samples

A compact visualization identifies samples with similar explanations.